Code Visibility and Accessibility

The R code chunks in this report are hidden by default. The Code drop-down menu at the beginning of the report reveals all chunks at once for easier reading. Alternatively, individual code sections can be shown by clicking the Code button above each chunk; clicking the button again hides the code.

Acknowledgement

Grammarly, an AI-powered writing assistant tool, was utilised to enhance the clarity, coherence, and grammatical accuracy of initial notes and the final draft. Its assistance in identifying errors and suggesting style and vocabulary improvements significantly contributed to the quality of this report.

Executive Summary

This report presents the findings of a comprehensive analysis and evaluation of various forecasting methods applied to time series data from the M3 competition data set. The project aimed to explore the efficacy of different statistical models in accurately predicting future values of quarterly time series data, both through manual forecasting for individual series and batch forecasting for a range of series.

Through meticulous analysis and comparison, the regression model emerged as the most accurate method in manual forecasting, surpassing both exponential smoothing with the Holt-Winters method and ARIMA models on the accuracy metrics. In batch forecasting, however, the automatic ARIMA, automatic ETS, and TBATS models failed to outperform the Theta and Damped Exponential Smoothing benchmark models.

Introduction

Forecasting is a vital tool for decision-making across diverse domains. It enables organisations to anticipate future trends, allocate resources efficiently, and mitigate risks effectively. In practice, accurate forecasting is indispensable for strategic planning, inventory management, and market analysis.

This report delves into the practical aspects of forecasting by evaluating different statistical models applied to time series data. Time series forecasting, encompassing manual and batch approaches, is instrumental for businesses and organisations to optimise operations and capitalise on emerging opportunities.

The report focuses on analysing and forecasting quarterly time series data from the M3 competition data set, spanning various sectors such as finance, economics, and demographics. Each time series represents a historical record of a specific variable, such as sales numbers, stock prices, or production levels, recorded at regular intervals—yearly, quarterly, or monthly. By leveraging manual forecasting techniques for individual series and batch forecasting methods for multiple series simultaneously, this study aims to provide insights into the effectiveness of different forecasting methodologies.

According to Koning, Franses, Hibon and Stekler (2005), four main conclusions of the M3 competition were derived from descriptive statistics analysis with no formal statistical testing. First, the complexity of forecasting methods does not always correlate with forecast accuracy; simpler methods can be equally effective. Second, the performance rankings of different methods vary depending on the accuracy measure employed. Third, combining various forecasting methods tends to yield better results than using individual methods alone, demonstrating superior performance overall. Lastly, the effectiveness of forecasting methods is influenced by the length of the forecasting horizon.

Manual Modelling

Exploratory analysis

In this section, we focus on series ID 1394 from the M3 forecasting competition, which tracks quarterly demographic data, specifically unemployment in Canada. This time series is valuable for analysts and researchers studying labour market trends in Canada and could aid in developing forecasting models to predict future unemployment based on historical patterns. We assume the data has been adjusted to represent per capita data. A summary of series 1394 is provided below.

# Load series 1394 (requires the Mcomp package); the series is quarterly,
# so its frequency is already 4 and no further subsetting is needed
library(Mcomp)
data <- M3[[1394]]

# Assign the historical data to a variable for training the model
training_data <- data$x

# Assign the future data points for testing the model's predictions
test_data <- data$xx

{ cat("M3 competition series ID 1394\n"); print(data) }
M3 competition series ID 1394
$st
[1] "Q749"

$type
[1] "DEMOGRAPHIC"

$period
[1] "QUARTERLY"

$description
[1] "UNEMPLOYMENT- CANADA"

$sn
[1] "N1394"

$x
     Qtr1 Qtr2 Qtr3 Qtr4
1962 5610 3730 2820 3460
1963 5460 3720 2720 3050
1964 4630 3260 2430 2660
1965 3970 2980 2100 2140
1966 3520 2580 2260 2330
1967 3920 3200 2500 2990
1968 4780 3990 3170 3330
1969 4630 4000 3150 3500
1970 5180 5290 4550 4780
1971 6640 5840 4680 4930
1972 6450 5710 5020 5300
1973 6500 5220 4380 4700

$xx
     Qtr1 Qtr2 Qtr3 Qtr4
1974 6240 5200 4480 5070
1975 8320 7380 6210 6380

$h
[1] 8

$n
[1] 48

Within the context of the M3 competition data set, ‘x’ serves as the historical (a.k.a. training) time series data from 1962 to 1973, which is used to develop and calibrate forecasting models, while ‘xx’ acts as the future (a.k.a. test) data set from 1974 to 1975, which is used to evaluate the performance and accuracy of the models’ predictions.

Producing a time series plot of the historical data (Figure 1) and its summary will help gain insights into the data’s characteristics.

# Generate a time series plot of the training data
autoplot(training_data,
         xlab = "Year",      
         ylab = "Unemployment",  
         main = "Quarterly unemployment in Canada in 1962-1973 (Series ID 1394)") +
    theme_tufte(base_family = "Helvetica") +                 
    theme(
      plot.title = element_text(hjust = 0.5),                 
      legend.position = "top"                                 
    )

Figure 1: Quarterly unemployment in Canada in 1962-1973

kableExtra::kbl(glance(summary(training_data)), digits = 2, caption = "Summary of the quarterly unemployment in Canada in 1962-1973")|>
  kable_styling(
    position = "center",
    full_width = FALSE
  )
Table 1: Summary of the quarterly unemployment in Canada in 1962-1973
minimum q1 median mean q3 maximum
2100 3035 3945 4036.67 4952.5 6640

The series exhibits a downward trend before 1966, followed by a clear upward trend from 1966 onwards, indicating rising unemployment thereafter. A strong seasonal pattern is evident in Figure 2, with peaks in Q1 and troughs in Q3 (Figure 3), suggesting annual seasonality influenced by factors related to economic activity, societal habits, and institutional schedules.

ggseasonplot(training_data, year.labels=TRUE, year.labels.left=TRUE) +
  ylab("Unemployment") +
  ggtitle("Seasonality: quarterly unemployment in Canada")+
    theme_tufte(base_family = "Helvetica") +                 
    theme(
      plot.title = element_text(hjust = 0.5),                 
      legend.position = "top"                                 
    )

Figure 2: Seasonality plot of the quarterly unemployment in Canada in 1962-1973

ggsubseriesplot(training_data) +
  ylab("Unemployment") +
  ggtitle("Seasonal subseries: quarterly unemployment in Canada")+
    theme_tufte(base_family = "Helvetica") +                 
    theme(
      plot.title = element_text(hjust = 0.5),                 
      legend.position = "top"                                 
    )

Figure 3: Seasonal subseries plot of the quarterly unemployment in Canada in 1962-1973

The lagged scatter plots of the quarterly Canadian unemployment (Figure 4) reveal a strongly positive relationship at lag 4, reflecting the strong seasonality in the data.

gglagplot(training_data) +
  ggtitle("Lagged scatter plot: quarterly unemployment") +
    theme(legend.position = "bottom",
          axis.text.x = element_text(angle = 45, hjust = 1)) +
    theme_tufte(base_family = "Helvetica") +                 
    theme(
      plot.title = element_text(hjust = 0.5),                 
      legend.position = "top"                                 
    )

Figure 4: Lagged scatter plots of the quarterly unemployment in Canada in 1962-1973

As the data are both trended and seasonal, we observe the slow decay in autocorrelation associated with the trend component alongside more significant spikes at lags matching the seasonal period in Figure 5.

ggAcf(training_data, main = "ACF: unemployment in Canada in 1962-1973")+
    theme_tufte(base_family = "Helvetica") +                 
    theme(
      plot.title = element_text(hjust = 0.5),                 
      legend.position = "top"                                 
    )

Figure 5: Autocorrelation Function (ACF) plot of the quarterly unemployment in Canada in 1962-1973

As the data exhibit both trend and seasonality, decomposition can provide valuable insight by separating the time series into trend, seasonal, and remainder components. A multiplicative decomposition is applied to the training data in Figure 6; under this scheme, the seasonal and trend-cycle components are removed by division rather than subtraction.

training_decomposed <- decompose(training_data, type = "multiplicative")
autoplot(training_decomposed)+xlab("Year")+
    theme_tufte(base_family = "Helvetica") +                 
    theme(
      plot.title = element_text(hjust = 0.5),                 
      legend.position = "top"                                 
    )

Figure 6: Decomposed quarterly unemployment in Canada in 1962-1973

However, discerning longer-term cyclical fluctuations in Canada’s unemployment due to broader economic conditions within the limited time frame of the plot (spanning from 1962 to 1973) may be challenging. Extended data covering a more prolonged period would be necessary to reliably identify and confirm these broader economic trends and their impact on unemployment.

Based on the above, any forecasts of this series would need to capture the trend and seasonal patterns.

Detecting outliers in time series data in Figure 7 is critical as they can significantly impact analysis and forecasts.

# Ensure the time series has the correct frequency set
training_ts <- ts(training_data, frequency = 4)

# Convert the time series to a data frame for plotting, using cycle()
# to label each observation with its quarter
df <- data.frame(
  Value = as.numeric(training_ts),
  Quarter = factor(cycle(training_ts))
)

ggplot(df, aes(x = Quarter, y = Value)) +
  geom_boxplot() +
  labs(title = "Distribution of unemployment in Canada in 1962-1973", x = "Quarter", y = "Unemployment")+
    theme_tufte(base_family = "Helvetica") +                 
    theme(
      plot.title = element_text(hjust = 0.5),                 
      legend.position = "top"                                 
    )

Figure 7: Distribution of unemployment in Canada in 1962-1973 per quarter

By generating boxplots for each quarter to detect outliers within each period, we conclude there are no outliers that could affect forecasting.
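The visual boxplot inspection can be cross-checked with a more automated procedure. As a sketch (assuming the forecast package, already used elsewhere in this report, is loaded), tsoutliers() flags observations that deviate strongly from a smoothed estimate of the series:

```r
# Complementary automated outlier check: tsoutliers() returns the indices
# of suspected outliers and suggested replacement values
library(forecast)

outliers <- tsoutliers(training_data)

if (length(outliers$index) == 0) {
  cat("No outliers detected by tsoutliers()\n")
} else {
  cat("Potential outliers at positions:", outliers$index, "\n")
  cat("Suggested replacements:", round(outliers$replacements), "\n")
}
```

If this agrees with the boxplot inspection, the index vector should be empty; were outliers ever detected, tsclean() would provide a cleaned version of the series as a drop-in replacement.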

Regression modelling, analysis and forecasting

Considering the changing trend and the seasonality in the historical data, a regression model with trend and seasonal components as predictors seems appropriate. To capture the curvature of the series, we also add quadratic and cubic trend terms. We fit this polynomial regression model using the tslm() function in R, which is specifically designed for time series data.

# Regression Model
regression_model <- tslm(training_data ~ trend + I(trend^2) + I(trend^3)+ season)

{ cat("Regression model summary\n"); print(summary(regression_model)) }
Regression model summary

Call:
tslm(formula = training_data ~ trend + I(trend^2) + I(trend^3) + 
    season)

Residuals:
    Min      1Q  Median      3Q     Max 
-751.41 -261.72   48.11  231.77  732.82 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.113e+03  2.407e+02  25.394  < 2e-16 ***
trend       -3.551e+02  4.045e+01  -8.779 5.80e-11 ***
I(trend^2)   1.651e+01  1.908e+00   8.653 8.57e-11 ***
I(trend^3)  -1.863e-01  2.561e-02  -7.274 6.78e-09 ***
season2     -9.863e+02  1.500e+02  -6.576 6.56e-08 ***
season3     -1.810e+03  1.505e+02 -12.026 5.00e-15 ***
season4     -1.544e+03  1.513e+02 -10.204 8.08e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 366.9 on 41 degrees of freedom
Multiple R-squared:  0.9245,    Adjusted R-squared:  0.9135 
F-statistic: 83.68 on 6 and 41 DF,  p-value: < 2.2e-16

The model demonstrates a good fit to the training data in Figure 8, explaining approximately 92.45% of the variance, and is statistically significant based on the F-statistic. The coefficients for the trend terms and seasonal components are all statistically significant at conventional levels. The negative linear and cubic trend terms, combined with the positive quadratic term, produce the non-linear trend visible in the data: an initial decline followed by a rise. Holding the trend fixed, the second quarter experienced unemployment approximately 986.3 units lower than the first quarter on average, the third quarter 1810 units lower, and the fourth quarter 1544 units lower.

autoplot(training_data, series="Data") +
  autolayer(fitted(regression_model), series="Fitted") +
  xlab("Year") + ylab("Unemployment") +
  ggtitle("Regression model: historical vs fitted values") +
  guides(color = guide_legend(title = "")) +
    theme_tufte(base_family = "Helvetica") +                 
    theme(
      plot.title = element_text(hjust = 0.5),                 
      legend.position = "top"                                 
    )

Figure 8: Regression model: quarterly unemployment in Canada in 1962-1973 vs. fitted values

The residual standard error is relatively low (366.9), suggesting that the model provides a satisfactory fit to the data. Based on the Shapiro-Wilk test, there is insufficient evidence to conclude that the residuals depart significantly from a normal distribution, so the assumption of normality appears reasonable.

# Test for normality
kableExtra::kbl(glance(shapiro.test(regression_model$residuals)), digits = 2, caption = "Shapiro-Wilk test of the regression model's training residuals")|>
  kable_styling(
    position = "center",
    full_width = FALSE
  )
Table 2: Shapiro-Wilk test of the regression model’s training residuals
statistic p.value method
0.98 0.79 Shapiro-Wilk normality test

The residuals appear to be normally distributed, and there are no apparent patterns in the residual plots in Figure 9.

# Plot histogram of residuals
checkresiduals(regression_model) 

Figure 9: Training residuals from the regression model


    Breusch-Godfrey test for serial correlation of order up to 10

data:  Residuals from Linear regression model
LM test = 27.31, df = 10, p-value = 0.002326

However, the Breusch-Godfrey test indicates evidence of serial correlation in the residuals from the linear regression model, and the residuals’ Autocorrelation Function plot reveals significant autocorrelation at lag 1, suggesting a violation of the independence assumption.

Subsequently, the Ljung-Box test confirms significant autocorrelation in the training residuals at lag 1, indicating that a time series model explicitly addressing autocorrelation, such as an ARIMA model, may offer a better fit.

# Perform Ljung-Box test on the residuals
ljung_box_test <- Box.test(regression_model$residuals, lag = 1, type = "Ljung-Box")

kableExtra::kbl(glance(ljung_box_test), digits = 2, caption = "Box-Ljung test of the regression model's training residuals")|>
  kable_styling(
    position = "center",
    full_width = FALSE
  )
Table 3: Box-Ljung test of the regression model’s training residuals
statistic p.value parameter method
20.59 0 1 Box-Ljung test

When evaluating the regression model’s forecasting performance on future data, displayed as a dashed black line in Figure 10, we notice that it predicts unemployment reasonably well for the first four quarters, with the future data lying within the 80% prediction interval. However, unemployment sharply increased during the last four quarters, contrary to the regression model’s forecast, which exhibits a clear downward trend. This discrepancy suggests that the model does not capture certain patterns present in the data.

# Forecast
forecast_regression <- forecast(regression_model, h = 8, PI = TRUE, level = c(0.8, 0.95))

# Plot the forecast and test data together
autoplot(training_data) +
  autolayer(forecast(regression_model, level = c(0.95)), series = "95% CI") +
      autolayer(forecast(regression_model, level = c(0.8)), series = "80% CI") +
  autolayer(test_data, color = "black", linetype = "dashed", series = "Future Data") +
  xlab("Year") +
  ylab("Unemployment") +
  ggtitle("Regression model: forecasts of unemployment in Canada") +
  guides(color = guide_legend(title = "Forecasts"),
         linetype = guide_legend(title = "Future Data")) +
    theme_tufte(base_family = "Helvetica") +                 
    theme(
      plot.title = element_text(hjust = 0.5),                 
      legend.position = "top"                                 
    )

Figure 10: Regression model’s eight-quarter forecast of unemployment in Canada

The Ljung-Box test on the out-of-sample residuals yields a p-value (0.07898) above the conventional significance level of 0.05, so there is no strong evidence of remaining autocorrelation in the residuals, although the result is borderline.

# Calculate residuals
residuals_regression <- forecast_regression$mean - test_data
checkresiduals(residuals_regression)

Figure 11: Regression model: out-of-sample residuals


    Ljung-Box test

data:  Residuals
Q* = 6.7876, df = 3, p-value = 0.07898

Model df: 0.   Total lags used: 3

We can use various evaluation metrics, such as MAE, RMSE, and MAPE, to evaluate the accuracy of the regression model’s forecasts of future data.

# Calculate evaluation metrics
mae_regression <- mean(abs(residuals_regression))
rmse_regression <- sqrt(mean(residuals_regression^2))
mape_regression <- mean(abs(residuals_regression / test_data)) * 100
bias_regression <- mean(residuals_regression)

# Create a data frame for the metrics
evaluation_regression <- data.frame(
  Metric = c("Mean Absolute Error (MAE)", "Root Mean Squared Error (RMSE)", 
             "Mean Absolute Percentage Error (MAPE)", "Forecast Bias"),
  Value = c(mae_regression, rmse_regression, mape_regression, bias_regression)
)

# Remove column names
colnames(evaluation_regression) <- NULL

# Print the data frame

kableExtra::kbl(evaluation_regression, digits = 2, caption = "Error measures evaluating the regression model's out-of-sample accuracy")|>
  kable_styling(
    position = "center",
    full_width = FALSE
  )
Table 4: Error measures evaluating the regression model’s out-of-sample accuracy
Mean Absolute Error (MAE) 1388.75
Root Mean Squared Error (RMSE) 1812.55
Mean Absolute Percentage Error (MAPE) 20.51
Forecast Bias -1304.33

An RMSE of 1812.55 indicates that, on average, the forecasts deviate from the actual future values by approximately 1813 units. The MAPE suggests that, on average, the forecasts differ from the actual values by approximately 20.5%. The negative bias indicates that the forecasts tend to underestimate the actual values by approximately 1304 units on average.

Overall, the forecasting performance of this model seems to have moderate accuracy. The MAE, RMSE, and MAPE values suggest that the forecasts have a reasonable level of accuracy, although there is room for improvement. Additionally, the negative bias indicates a tendency for the forecasts to be consistently lower than the actual values.

Notably, the significant jump in unemployment observed in 1975, the highest value throughout the observation period, indicates an extraordinary event that the model could not consider. This highlights the challenge of forecasting, which assumes that future patterns and behaviours will resemble those observed in the past.

Exponential smoothing modelling, analysis and forecasting

To manually fit a model from the exponential smoothing family, tailored for time series data exhibiting trend and seasonality, the Holt-Winters method is a suitable choice. This method extends simple exponential smoothing to accommodate data with both trend and seasonality. We will fit the Holt-Winters model incorporating multiplicative error, additive trend with damping, and multiplicative seasonality components (ETS(M,Ad,M)).

# Fit the ETS(M,Ad,M) model: multiplicative error, damped additive trend,
# multiplicative seasonality
hw_model <- ets(training_data, model="MAM", damped=TRUE)

{ cat("Exponential Smoothing model with the Holt-Winters method summary\n"); print(summary(hw_model)) }
Exponential Smoothing model with the Holt-Winters method summary
ETS(M,Ad,M) 

Call:
ets(y = training_data, model = "MAM", damped = TRUE)

  Smoothing parameters:
    alpha = 0.4065 
    beta  = 0.4065 
    gamma = 0.5796 
    phi   = 0.8758 

  Initial states:
    l = 4134.9175 
    b = -5.4187 
    s = 0.881 0.7053 0.9446 1.4691

  sigma:  0.0855

     AIC     AICc      BIC 
752.0501 757.9960 770.7621 

Training set error measures:
                    ME     RMSE      MAE        MPE     MAPE      MASE
Training set -28.89258 321.5432 244.6389 -0.1225624 5.976051 0.5334049
                  ACF1
Training set 0.3023249

A sigma value of 0.0855 indicates a relatively small multiplicative error, suggesting that the model captures much of the variability in the training data. The AIC, AICc, and BIC values are only meaningful relative to alternative models, but they provide a baseline for comparison.

autoplot(training_data, series="Data") +
  autolayer(fitted(hw_model), series="Fitted") +
  xlab("Year") + ylab("Unemployment") +
    ggtitle("ETS (M, Ad, M) model: historical vs fitted values") +
  guides(color = guide_legend(title = "")) +
    theme_tufte(base_family = "Helvetica") +                 
    theme(
      plot.title = element_text(hjust = 0.5),                 
      legend.position = "top"                                 
    )

Figure 12: Exponential smoothing (ETS (M, Ad, M)) model: quarterly unemployment in Canada in 1962-1973 vs. fitted values

Upon examination of residuals, they appear to be normally distributed without any discernible patterns.

# Plot histogram of residuals
checkresiduals(hw_model) 

Figure 13: Exponential smoothing (ETS (M, Ad, M)) model: training residuals


    Ljung-Box test

data:  Residuals from ETS(M,Ad,M)
Q* = 14.359, df = 8, p-value = 0.07287

Model df: 0.   Total lags used: 8

This is supported by the Shapiro-Wilk test results, which do not provide sufficient evidence to reject the assumption of normality.

# Test for normality

kableExtra::kbl(glance(shapiro.test(hw_model$residuals)), digits = 2, caption = "Shapiro-Wilk test of the exponential smoothing model's training residuals")|>
  kable_styling(
    position = "center",
    full_width = FALSE
  )
Table 5: Shapiro-Wilk test of the exponential smoothing model’s training residuals
statistic p.value method
0.97 0.39 Shapiro-Wilk normality test

However, further evaluation of the model on future data is warranted to provide a comprehensive assessment.

When evaluating the forecasting accuracy of the exponential smoothing model with the Holt-Winters method against future data in Figure 14, we observe that while the future data lies within the 95% prediction interval for the first four quarters, it exceeds the forecasted range thereafter.

# Forecast
forecast_hw <- forecast(hw_model, h = 8, PI = TRUE, level = c(0.8, 0.95))

# Plot the forecast and test data together
autoplot(training_data) +
  autolayer(forecast(hw_model, level = c(0.95)), series = "95% CI") +
      autolayer(forecast(hw_model, level = c(0.8)), series = "80% CI") +
  autolayer(test_data, color = "black", linetype = "dashed", series = "Future Data") +
  xlab("Year") +
  ylab("Unemployment") +
  ggtitle("ETS(M,Ad,M) model: forecasts of unemployment in Canada") +
  guides(color = guide_legend(title = "Forecasts"),
         linetype = guide_legend(title = "Future Data"))+
    theme_tufte(base_family = "Helvetica") +                 
    theme(
      plot.title = element_text(hjust = 0.5),                 
      legend.position = "top"                                 
    )

Figure 14: Exponential smoothing (ETS (M, Ad, M)) model’s eight-quarter forecast of unemployment in Canada

This discrepancy makes it challenging to directly assess the model’s forecasting performance solely based on this plot. Therefore, we conduct residual analysis in Figure 15 to obtain a more objective evaluation and examine various evaluation metrics such as MAE, RMSE, and MAPE.

# Calculate residuals
residuals_hw <- forecast_hw$mean - test_data
checkresiduals(residuals_hw) 

Figure 15: Exponential smoothing (ETS (M, Ad, M)) model: out-of-sample residuals


    Ljung-Box test

data:  Residuals
Q* = 4.8876, df = 3, p-value = 0.1802

Model df: 0.   Total lags used: 3

The Ljung-Box test’s p-value (0.1802) is greater than the commonly used significance level of 0.05, suggesting no significant autocorrelation in the residuals at the tested lags.

However, the model performs poorly against the future data.

# Calculate evaluation metrics
mae_hw <- mean(abs(residuals_hw))
rmse_hw <- sqrt(mean(residuals_hw^2))
mape_hw <- mean(abs(residuals_hw / test_data)) * 100
bias_hw <- mean(residuals_hw)

# Create a data frame for the metrics
evaluation_hw <- data.frame(
  Metric = c("Mean Absolute Error (MAE)", "Root Mean Squared Error (RMSE)", 
             "Mean Absolute Percentage Error (MAPE)", "Forecast Bias"),
  Value = c(mae_hw, rmse_hw, mape_hw, bias_hw)
)

# Remove column names
colnames(evaluation_hw) <- NULL

# Print the data frame

kableExtra::kbl(evaluation_hw, digits = 2, caption = "Error measures evaluating ETS(M,Ad,M) model's out-of-sample accuracy")|>
  kable_styling(
    position = "center",
    full_width = FALSE
  )
Table 6: Error measures evaluating ETS(M,Ad,M) model’s out-of-sample accuracy
Mean Absolute Error (MAE) 1975.52
Root Mean Squared Error (RMSE) 2292.64
Mean Absolute Percentage Error (MAPE) 30.04
Forecast Bias -1975.52

High MAE and RMSE values indicate large average deviations from the actual values, while a MAPE of 30.04% reflects considerable discrepancies relative to the actual values. Furthermore, the negative bias indicates a systematic underestimation of the actual values.

In conclusion, the model’s performance is subpar, characterised by high errors, substantial deviations from actual future values, and systematic bias. Further refinement or alternative modelling approaches may be necessary to enhance forecasting accuracy.

ARIMA modelling, analysis and forecasting

As established earlier in this report, the historical time series data exhibits trend and seasonality components, rendering it non-stationary. This is confirmed by the Autocorrelation Function (ACF) plot in Figure 16, which displays multiple spikes outside the confidence intervals, with a notably strong and significantly positive first lag.

acf(training_data, main = "ACF: unemployment in Canada in 1962-1973")

Figure 16: Autocorrelation Function (ACF) plot of the quarterly unemployment in Canada in 1962-1973

To address non-stationarity, differencing can stabilise the time series’ mean and eliminate trends and seasonality. Augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests are conducted to determine the necessity of differencing objectively.

library(tseries)
print(adf.test(training_data, alternative = "stationary")) 

    Augmented Dickey-Fuller Test

data:  training_data
Dickey-Fuller = -1.9397, Lag order = 3, p-value = 0.5982
alternative hypothesis: stationary

The ADF test yields a high p-value (0.5982), failing to reject the null hypothesis of non-stationarity, while the KPSS test, with a p-value of 0.01, rejects the null hypothesis of level stationarity:

# kpss.test() is provided by the tseries package loaded above
print(kpss.test(training_data))

    KPSS Test for Level Stationarity

data:  training_data
KPSS Level = 0.80281, Truncation lag parameter = 3, p-value = 0.01

As the next step, the ndiffs() and nsdiffs() functions determine the appropriate orders of first and seasonal differencing for the training data:

{cat("Order of first differencing: ", ndiffs(training_data), "\n"); cat("Order of seasonal differencing: ", nsdiffs(training_data))}
Order of first differencing:  1 
Order of seasonal differencing:  1

A Box-Cox transformation will not be performed due to the absence of evidence indicating variance changes.
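This judgement can be cross-checked numerically. As a sketch (assuming the forecast package is loaded), BoxCox.lambda() estimates the transformation parameter; a value near 1 supports leaving the data untransformed:

```r
# Estimate the Box-Cox transformation parameter for the training data;
# a lambda close to 1 implies no transformation is needed
library(forecast)

lambda <- BoxCox.lambda(training_data)
cat("Estimated Box-Cox lambda:", round(lambda, 3), "\n")

# If a transformation were warranted, it could be applied and later
# inverted with BoxCox() / InvBoxCox():
transformed <- BoxCox(training_data, lambda)
```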

Subsequently, the first and seasonal differencings are applied to the historical data, followed by an examination of the ACF/PACF plots in Figure 17.

training_diff <- diff(diff(training_data, differences = 1), lag = 4)

# Set up a multi-panel plot with 1 row and 2 columns
par(mfrow = c(1, 2))

# Plot ACF for differenced data
acf(training_diff, main = "Differenced unemployment")

# Plot PACF for differenced data
pacf(training_diff, main = "")

Figure 17: Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots of the quarterly unemployment in Canada in 1962-1973

The absence of significant spikes in the ACF plot of the differenced series suggests that no moving average (MA) terms are needed, while the significant spike in the PACF plot points to a potential autoregressive (AR) component; a PACF spike at the seasonal lag would motivate a seasonal AR term.

The efficacy of differencing is further confirmed by the plot of the differenced series in Figure 18 and by the ADF test below, which indicates stationarity of the differenced series, while the KPSS test fails to reject its null hypothesis of stationarity.

print(adf.test(training_diff, alternative = "stationary"))

    Augmented Dickey-Fuller Test

data:  training_diff
Dickey-Fuller = -4.2479, Lag order = 3, p-value = 0.01
alternative hypothesis: stationary
print(kpss.test(training_diff))

    KPSS Test for Level Stationarity

data:  training_diff
KPSS Level = 0.11986, Truncation lag parameter = 3, p-value = 0.1
# Plot Differenced Data
autoplot(training_diff)+
 ggtitle("Differenced unemployment in Canada in 1962-1973") +
    xlab("Year")+
    ylab("Differenced unemployment")+
  guides(color = guide_legend(title = "")) +
  theme_tufte(base_family = "Helvetica") +
  theme(
    plot.title = element_text(hjust = 0.5),
    legend.position = "top"
    )
Figure 18: Differenced quarterly unemployment in Canada in 1962-1973

With stationarity achieved, modelling using ARIMA (AutoRegressive Integrated Moving Average) methods becomes feasible. Based on these observations, an initial ARIMA(0,1,0)(1,1,0)[4] model is proposed, accounting for the first and seasonal differencing and the seasonal AR signature.

arima_model <- Arima(training_data, order = c(0,1,0), seasonal = c(1,1,0))
print(summary(arima_model))
Series: training_data 
ARIMA(0,1,0)(1,1,0)[4] 

Coefficients:
         sar1
      -0.2806
s.e.   0.1504

sigma^2 = 97474:  log likelihood = -307.65
AIC=619.3   AICc=619.6   BIC=622.82

Training set error measures:
                    ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
Training set -5.924679 292.0447 219.2004 0.2442629 5.699038 0.4779394 0.1445496

These statistical measures provide a way to compare the goodness of fit among different models. In this case, the log-likelihood is -307.65, and the AIC, AICc, and BIC are 619.3, 619.6, and 622.82, respectively. Lower values of AIC, AICc, and BIC indicate a better fit; on their own these figures carry little meaning and become informative only when compared against alternative models fitted to the same data.

The seasonal autoregressive term (sar1) coefficient is -0.2806, indicating a negative relationship between the observations and their seasonal lagged values. With a standard error (s.e.) of 0.1504, the estimate lies just under two standard errors from zero, borderline but suggestive evidence for retaining the seasonal term.

The MPE (Mean Percentage Error) is 0.24%, which measures the average signed relative error. Being close to 0, it indicates that the model’s fitted values show little systematic bias, neither consistently over- nor under-forecasting.

autoplot(training_data, series="Data") +
  autolayer(fitted(arima_model), series="Fitted") +
  xlab("Year") + ylab("Unemployment") +
 ggtitle("ARIMA(0,1,0)(1,1,0)[4] model: historical vs fitted values") +
  guides(color = guide_legend(title = "")) +
  scale_fill_manual(values = c("lightblue", "skyblue"), labels = c("Yes", "No")) +
  theme_tufte(base_family = "Helvetica") +
  theme(
    plot.title = element_text(hjust = 0.5),
    legend.position = "top"
    )

Figure 19: ARIMA(0,1,0)(1,1,0)[4] model: quarterly unemployment in Canada in 1962-1973 vs fitted values

Residual analysis in Figure 20 reveals normally distributed residuals with no significant autocorrelation, further affirming the adequacy of the ARIMA model.

# Plot histogram of residuals
checkresiduals(arima_model)

Figure 20: ARIMA(0,1,0)(1,1,0)[4] model: training residuals


    Ljung-Box test

data:  Residuals from ARIMA(0,1,0)(1,1,0)[4]
Q* = 5.5894, df = 7, p-value = 0.5884

Model df: 1.   Total lags used: 8

The Ljung-Box test result, with a p-value of 0.5884, indicates that the model adequately captures the temporal dependence structure present in the historical data.

The Shapiro-Wilk test, which assesses the normality of the residuals from the ARIMA model, does not provide sufficient evidence to reject the null hypothesis of normality. This suggests that the residuals from the ARIMA model are approximately normally distributed.

# Test for normality
print(shapiro.test(arima_model$residuals))

    Shapiro-Wilk normality test

data:  arima_model$residuals
W = 0.98638, p-value = 0.8451

When evaluated against the future data, the ARIMA model exhibits reasonably low forecast errors and bias, indicating satisfactory performance, although it struggles to predict the extraordinary spikes visible in the future data in Figure 21.

# Forecast
forecast_arima <- forecast(arima_model, h = 8, level = c(80, 95))

# Plot the forecast and test data together
autoplot(training_data) +
  autolayer(forecast(arima_model, level = 95), series = "95% CI") +
  autolayer(forecast(arima_model, level = 80), series = "80% CI") +
  autolayer(test_data, color = "black", linetype = "dashed", series = "Future Data") +
  xlab("Year") +
  ylab("Unemployment") +
  ggtitle("ARIMA(0,1,0)(1,1,0)[4] model: forecasts of unemployment in Canada") +
  guides(color = guide_legend(title = "Forecasts"),
         linetype = guide_legend(title = "Future Data"))+
  scale_fill_manual(values = c("lightblue", "skyblue"), labels = c("Yes", "No")) +
  theme_tufte(base_family = "Helvetica") +
  theme(
    plot.title = element_text(hjust = 0.5),
    legend.position = "top"
    )

Figure 21: ARIMA(0,1,0)(1,1,0)[4] model’s eight-quarter forecast of unemployment in Canada

The Ljung-Box test’s p-value (0.09731) fails to reject the null hypothesis of no autocorrelation in the residuals, suggesting no significant autocorrelation at the specified lags.

# Calculate residuals
residuals_arima <- forecast_arima$mean - test_data

checkresiduals(residuals_arima)

Figure 22: ARIMA(0,1,0)(1,1,0)[4] model: out-of-sample residuals


    Ljung-Box test

data:  Residuals
Q* = 6.3136, df = 3, p-value = 0.09731

Model df: 0.   Total lags used: 3

The error measures suggest that the model’s forecasts have relatively low errors and bias, indicating reasonably good performance, considering the future data had extraordinary spikes.

# Calculate evaluation metrics
mae_arima <- mean(abs(residuals_arima))
rmse_arima <- sqrt(mean(residuals_arima^2))
mape_arima <- mean(abs(residuals_arima / test_data)) * 100
bias_arima <- mean(residuals_arima)

# Create a data frame for the metrics
evaluation_arima <- data.frame(
  Metric = c("Mean Absolute Error (MAE)", "Root Mean Squared Error (RMSE)", 
             "Mean Absolute Percentage Error (MAPE)", "Forecast Bias"),
  Value = c(mae_arima, rmse_arima, mape_arima, bias_arima)
)

# Remove column names
colnames(evaluation_arima) <- NULL

# Print the data frame

kableExtra::kbl(evaluation_arima, digits = 2, caption = "Error measures evaluating the ARIMA model’s out-of-sample accuracy")|>
  kable_styling(
    position = "center",
    full_width = FALSE
  )
Table 7: Error measures evaluating the ARIMA model’s out-of-sample accuracy
Mean Absolute Error (MAE) 1533.37
Root Mean Squared Error (RMSE) 1899.72
Mean Absolute Percentage Error (MAPE) 23.03
Forecast Bias -1533.37

Comparative analysis of Mean Absolute Percentage Errors (MAPEs) among the three forecasting models helps identify the most suitable one. This is a reasonable approach when the models differ in structure or complexity, such as the regression model with its polynomial terms, because MAPE is a relative error metric that accounts for the magnitude of the forecasted values and so allows comparison across different scales:

# Store evaluation metrics for each model in a data frame
evaluation_metrics <- data.frame(
  Model = c("ARIMA", "Exponential Smoothing", "Regression"),
  MAE = c(mae_arima, mae_hw, mae_regression),
  RMSE = c(rmse_arima, rmse_hw, rmse_regression),
  MAPE = c(mape_arima, mape_hw, mape_regression),
  Bias = c(bias_arima, bias_hw, bias_regression)
)

# Print the evaluation metrics for comparison

kableExtra::kbl(evaluation_metrics, digits = 2, caption = "Error measures evaluating out-of-sample accuracy of the three models")|>
  kable_styling(
    position = "center",
    full_width = FALSE
  )
Table 8: Error measures evaluating out-of-sample accuracy of the three models
Model MAE RMSE MAPE Bias
ARIMA 1533.37 1899.72 23.03 -1533.37
Exponential Smoothing 1975.52 2292.64 30.04 -1975.52
Regression 1388.75 1812.55 20.51 -1304.33
# Select the model with the lowest values for the evaluation metrics
best_model <- evaluation_metrics[which.min(evaluation_metrics$MAPE), ]

# Print the best model
cat("Best Model based on MAPE:", best_model$Model, "\n")
Best Model based on MAPE: Regression 

However, it is crucial to acknowledge that the three forecasting models struggled to forecast unemployment in Canada after 1975 due to the complex and multifaceted nature of the economic conditions during that period. The global economic recession, triggered by the oil crisis of 1973-1974, resulted in stagnant growth and rising unemployment rates worldwide (Bank of Canada, 1999). Industrial restructuring, trade disruptions, and technological advancements further exacerbated job losses across various sectors. Government policies aimed at curbing inflation may have inadvertently worsened unemployment levels. The interplay of these factors created a highly volatile and uncertain economic environment, making it challenging for forecasting models to capture and predict unemployment dynamics accurately during that time.

Batch Forecasting

Exploratory analysis

In this section of the report, we undertake batch forecasting on a subset of quarterly time series from the M3 competition, specifically IDs 1001 to 1100. Each series comprises historical data utilised for fitting automatic forecasting models and future data used to evaluate the forecasting performance of the fitted models. Each time series represents a historical record of a specific variable, such as sales numbers, stock prices, or production levels. For instance, the time series with ID 1001 tracks the volume indices of exports (both goods and services) from Japan.

{cat("M3 competition series ID 1001\n"); print(M3[[1001]])}
M3 competition series ID 1001
Series: Q356
Type of series: MACRO
Period of series: QUARTERLY
Series description: OECD ECONOMIC OUTLOOK - JAPAN Export(Goods-Services)- Volume Indeces

HISTORICAL data
       Qtr1   Qtr2   Qtr3   Qtr4
1980 3311.5 3368.0 3363.0 3504.0
1981 3609.5 3848.0 4018.5 3980.5
1982 4016.0 4009.0 4053.0 4002.5
1983 3928.0 3983.0 4140.5 4305.0
1984 4481.5 4630.5 4739.0 4934.0
1985 4923.5 5135.5 4987.0 4951.5
1986 4692.0 4793.5 4681.0 4774.5
1987 4791.5 4839.0 5047.5 5140.0
1988 5198.0 5257.0 5713.0 5772.5
1989 5965.0 6168.0 6473.0 6638.0
1990 6928.5 7034.0 6874.5 7055.0

FUTURE data
       Qtr1   Qtr2   Qtr3   Qtr4
1991 7246.5 7146.0 7357.0 7469.0
1992 7507.5 7380.0 7449.5 7587.5

The primary objective of this section of the report is to select, automatically fit, and evaluate three distinct statistical models (ARIMA, ETS, and TBATS) for forecasting quarterly time series of IDs from 1001 to 1100 from the M3 competition data set.

Error measures selection

Following the generation of forecasts for the next eight quarters, we will assess their accuracy using two appropriate error measures:

Mean Absolute Percentage Error (MAPE) measures the percentage difference between the actual future and forecasted values, offering a relative measure of forecast accuracy. MAPE represents the average percentage error of the forecast relative to the future data. Its simplicity makes it easy to understand and communicate to stakeholders.

Symmetric Mean Absolute Percentage Error (sMAPE) measures the percentage difference between future and forecasted values in a symmetric manner, meaning it does not favour overestimation or underestimation. It calculates the absolute percentage error for each observation and then averages these errors across all observations. sMAPE is scale-independent, allowing for comparison of forecast accuracy across different data sets and variables. Its symmetry treats positive and negative errors equally, which can be advantageous in various forecasting scenarios.

While MAPE offers simplicity and straightforwardness, sMAPE provides additional robustness and symmetry, rendering it suitable for a broader range of forecasting scenarios. Both measures are valuable for comparing forecast accuracy across different time series with varying scales, aiding in comprehensive evaluation and decision-making processes.
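As a compact sketch of the two measures (the helper names mape() and smape() are ours, not part of the report’s code), the formulas applied inside the batch-forecasting loops below can be written as:

```r
# Hypothetical helpers mirroring the error formulas used in the batch loops
mape <- function(actual, fcst) {
  100 * mean(abs(actual - fcst) / abs(actual), na.rm = TRUE)
}

smape <- function(actual, fcst) {
  200 * mean(abs(actual - fcst) / (abs(actual) + abs(fcst)), na.rm = TRUE)
}

actual <- c(100, 110, 120)
fcst   <- c(90, 115, 130)

mape(actual, fcst)   # approximately 7.63
smape(actual, fcst)  # approximately 7.66
```

Note the 200 multiplier in sMAPE: because the denominator is the sum of the two absolute values rather than their mean, this scaling keeps the measure on a 0-200% range, matching the formula used in the loops.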

Benchmarking

In addition to evaluating each model’s accuracy using MAPE and sMAPE, we will compare their performance against two benchmark methods:

The Theta model considers both the level and the trend of the time series data. Theta forecasting provides a straightforward baseline for comparison, capturing basic trends in the data without incorporating more complex patterns. One advantage of Theta forecasting is its simplicity and ease of implementation, making it suitable for quick assessments of forecasting performance. According to Makridakis and Hibon (2000), despite its apparent simplicity and lack of reliance on robust statistical foundations, the Theta method exhibits remarkable accuracy across various series types, forecasting timeframes, and evaluation metrics. However, it may overlook more subtle patterns or seasonality in the data, limiting its accuracy compared to more sophisticated models.

Damped Exponential Smoothing is a variation of exponential smoothing that diminishes the influence of past observations as the forecast horizon extends. This method accounts for the decreasing relevance of historical data as time progresses, offering a more nuanced approach than the Theta model’s simplicity. Damped exponential smoothing is advantageous as it adapts to changing data patterns over time, providing smoother and often more accurate forecasts. However, it may struggle with abrupt changes or outliers in the data, potentially leading to inaccuracies in forecasting during periods of volatility.

By comparing the performance of the models against these benchmark methods, we can gain insights into their relative effectiveness and assess their ability to outperform basic forecasting approaches. This comparative analysis will aid in identifying the strengths and weaknesses of each model, guiding decision-making processes and informing future forecasting strategies.
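To make the benchmark setup concrete before the batch runs, the sketch below fits both benchmarks on a small synthetic quarterly series (not an M3 series), using the same calls as the loops that follow; it assumes the forecast package is installed:

```r
library(forecast)

# Synthetic quarterly series with trend and seasonality; hold out the last 8 quarters
y <- ts(50 + 1:48 + 5 * sin(2 * pi * (1:48) / 4), frequency = 4)
train <- window(y, end = c(10, 4))
h <- 8

# Theta benchmark: thetaf() fits and forecasts in a single call
theta_fcst <- thetaf(train, h = h)$mean

# Damped Exponential Smoothing benchmark: ets() with an automatically
# selected error/trend/seasonal structure and a damped trend
damped_fit <- ets(train, model = "ZZZ", damped = TRUE)
damped_fcst <- forecast(damped_fit, h = h)$mean

length(theta_fcst)   # 8 forecasts, one per held-out quarter
length(damped_fcst)  # 8
```

Both calls return point forecasts of length h, which is exactly what the MAPE and sMAPE calculations in the batch loops consume.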

Automatic ARIMA model

The auto.arima() function in R automatically selects the optimal ARIMA parameters for each time series (IDs 1001 to 1100) from the M3 competition data set based on the corrected Akaike Information Criterion (AICc). However, it may not effectively capture complex seasonal patterns or structural changes in the data, leading to suboptimal forecasts in certain cases. The automatic ARIMA model may also struggle with noisy or irregular data, requiring additional preprocessing or model refinement to improve forecasting accuracy.

# Define the series IDs and criterion
ts_start <- 1001
ts_end <- 1100
criterion <- "aicc"
num_ts <- ts_end - ts_start + 1

# Initialise arrays to store MAPE and sMAPE for ARIMA and benchmarks
mape_arima <- numeric(num_ts)
mape_theta <- numeric(num_ts)
mape_damped <- numeric(num_ts)
smape_arima <- numeric(num_ts)
smape_theta <- numeric(num_ts)
smape_damped <- numeric(num_ts)

# Loop through each time series
for (s in ts_start:ts_end) {
  train_data <- M3[[s]]$x
  test_data <- M3[[s]]$xx
  h <- length(test_data)

  # Fit ARIMA model
  arima_fit <- auto.arima(train_data, ic = criterion)

  arima_fcst <- forecast(arima_fit, h = h)$mean

  # Calculate MAPE for ARIMA
  mape_arima[s - ts_start + 1] <- 100 * mean(abs(test_data - arima_fcst) / test_data, na.rm = TRUE)
  # Calculate sMAPE for ARIMA
  smape_arima[s - ts_start + 1] <- 200 * mean(abs(test_data - arima_fcst) / (abs(test_data) + abs(arima_fcst)), na.rm = TRUE)

  # Fit Theta model
  theta_fit <- thetaf(train_data, h = h)
  theta_fcst <- theta_fit$mean  # thetaf() already returns a forecast object
  # Calculate MAPE for Theta
  mape_theta[s - ts_start + 1] <- 100 * mean(abs(test_data - theta_fcst) / test_data, na.rm = TRUE)
  # Calculate sMAPE for Theta
  smape_theta[s - ts_start + 1] <- 200 * mean(abs(test_data - theta_fcst) / (abs(test_data) + abs(theta_fcst)), na.rm = TRUE)

  # Fit Damped Exponential Smoothing model
  tryCatch({
    damped_model <- ets(train_data, model = "ZZZ", damped = TRUE)
    damped_fcst <- forecast(damped_model, h = h)$mean
    # Calculate MAPE for Damped Exponential Smoothing
    mape_damped[s - ts_start + 1] <- 100 * mean(abs(test_data - damped_fcst) / test_data, na.rm = TRUE)
    # Calculate sMAPE for Damped Exponential Smoothing
    smape_damped[s - ts_start + 1] <- 200 * mean(abs(test_data - damped_fcst) / (abs(test_data) + abs(damped_fcst)), na.rm = TRUE)
  }, error = function(e) {
    mape_damped[s - ts_start + 1] <<- NA  # <<- assigns to the vector in the enclosing scope
    smape_damped[s - ts_start + 1] <<- NA
  })
}

After computing the average Mean Absolute Percentage Error (MAPE) and Symmetric Mean Absolute Percentage Error (sMAPE) for the ARIMA models, we compare their performance with the Theta and Damped Exponential Smoothing models, which serve as benchmarks. Individual error values are calculated for each method across the time series.

# Calculate average MAPE and sMAPE for each method
avg_mape_arima <- mean(mape_arima, na.rm = TRUE)
avg_smape_arima <- mean(smape_arima, na.rm = TRUE)
avg_mape_theta <- mean(mape_theta, na.rm = TRUE)
avg_smape_theta <- mean(smape_theta, na.rm = TRUE)
avg_mape_damped <- mean(mape_damped, na.rm = TRUE)
avg_smape_damped <- mean(smape_damped, na.rm = TRUE)

# Store evaluation metrics for each model in a data frame
arima_batch_evaluation_metrics <- data.frame(
  Model = c("ARIMA", "Theta", "Damped Exponential Smoothing"),
  MAPE = c(avg_mape_arima, avg_mape_theta, avg_mape_damped),
  sMAPE = c(avg_smape_arima, avg_smape_theta, avg_smape_damped)
)

# Print the evaluation metrics for comparison

kableExtra::kbl(arima_batch_evaluation_metrics, digits = 2, caption = "Error measures evaluating automatic ARIMA model's out-of-sample accuracy")|>
  kable_styling(
    position = "center",
    full_width = FALSE
  )
Table 9: Error measures evaluating automatic ARIMA model’s out-of-sample accuracy
Model MAPE sMAPE
ARIMA 5.68 5.37
Theta 5.54 5.29
Damped Exponential Smoothing 4.67 4.54
# Select the model with the lowest values for MAPE
arima_batch_best_model_mape <- arima_batch_evaluation_metrics[which.min(arima_batch_evaluation_metrics$MAPE), ]

# Select the model with the lowest values for sMAPE
arima_batch_best_model_smape <- arima_batch_evaluation_metrics[which.min(arima_batch_evaluation_metrics$sMAPE), ]

# Print the best model
{cat("Best model based on MAPE:", arima_batch_best_model_mape$Model, "\n"); cat("Best model based on sMAPE:", arima_batch_best_model_smape$Model, "\n")}
Best model based on MAPE: Damped Exponential Smoothing 
Best model based on sMAPE: Damped Exponential Smoothing 

The ARIMA model’s forecasts carry an average absolute percentage error of approximately 5.68% (MAPE), with the symmetric measure slightly lower at around 5.37% (sMAPE).

The Theta model’s predictions show both a lower MAPE and a slightly lower sMAPE than the ARIMA model’s.

Conversely, the Damped Exponential Smoothing model displays the lowest average MAPE of around 4.67% and the lowest average sMAPE of about 4.54%. These results indicate that the Damped Exponential Smoothing model offers the most accurate forecasts among the three methods, with both MAPE and sMAPE indicating lower error rates than the ARIMA and Theta forecast models.

Automatic Error-Trend-Seasonality (ETS) model

The ets() function in R automatically selects the optimal ETS (Error, Trend, Seasonality) model based on each series’ data characteristics, such as the presence of trend and seasonality. The fitted ETS model contains information about the estimated parameters, including the level, trend, and seasonal components for each series and any additional settings or options specified during the fitting process. However, it may be computationally intensive for large-scale forecasting tasks and could produce suboptimal results if the underlying data contains irregular patterns or outliers.

# Define the series IDs and criterion
ts_start <- 1001
ts_end <- 1100
criterion <- "aicc"
num_ts <- ts_end - ts_start + 1

# Initialise arrays to store MAPE and sMAPE for ETS and benchmarks
mape_ets <- numeric(num_ts)
smape_ets <- numeric(num_ts)
mape_theta <- numeric(num_ts)
smape_theta <- numeric(num_ts)
mape_damped <- numeric(num_ts)
smape_damped <- numeric(num_ts)

# Loop through each time series
for (s in ts_start:ts_end) {
  train_data <- M3[[s]]$x
  test_data <- M3[[s]]$xx
  h <- length(test_data)

  # Fit ETS model
  ets_fit <- ets(train_data)
  # Summary of the fitted ETS model

  ets_fcst <- forecast(ets_fit, h = h)$mean

  # Calculate MAPE for ETS
  mape_ets[s - ts_start + 1] <- 100 * mean(abs(test_data - ets_fcst) / test_data, na.rm = TRUE)
  # Calculate sMAPE for ETS
  smape_ets[s - ts_start + 1] <- 200 * mean(abs(test_data - ets_fcst) / (abs(test_data) + abs(ets_fcst)), na.rm = TRUE)

  # Fit Theta model
  theta_fit <- thetaf(train_data, h = h)
  theta_fcst <- theta_fit$mean  # thetaf() already returns a forecast object

  # Calculate MAPE for Theta
  mape_theta[s - ts_start + 1] <- 100 * mean(abs(test_data - theta_fcst) / test_data, na.rm = TRUE)
  # Calculate sMAPE for Theta
  smape_theta[s - ts_start + 1] <- 200 * mean(abs(test_data - theta_fcst) / (abs(test_data) + abs(theta_fcst)), na.rm = TRUE)

  # Fit Damped Exponential Smoothing model
  tryCatch({
    damped_model <- ets(train_data, model = "ZZZ", damped = TRUE)
    damped_fcst <- forecast(damped_model, h = h)$mean
    # Calculate MAPE for Damped Exponential Smoothing
    mape_damped[s - ts_start + 1] <- 100 * mean(abs(test_data - damped_fcst) / test_data, na.rm = TRUE)
    # Calculate sMAPE for Damped Exponential Smoothing
    smape_damped[s - ts_start + 1] <- 200 * mean(abs(test_data - damped_fcst) / (abs(test_data) + abs(damped_fcst)), na.rm = TRUE)
  }, error = function(e) {
    mape_damped[s - ts_start + 1] <<- NA  # <<- assigns to the vector in the enclosing scope
    smape_damped[s - ts_start + 1] <<- NA
  })
}

Similarly to the auto ARIMA model, we calculate the average Mean Absolute Percentage Error (MAPE) and Symmetric Mean Absolute Percentage Error (sMAPE) for the ETS forecasting models and compare them with the metrics for Theta and Damped Exponential Smoothing models as benchmarks.

# Calculate average MAPE and sMAPE for each method
avg_mape_ets <- mean(mape_ets, na.rm = TRUE)
avg_smape_ets <- mean(smape_ets, na.rm = TRUE)
avg_mape_theta <- mean(mape_theta, na.rm = TRUE)
avg_smape_theta <- mean(smape_theta, na.rm = TRUE)
avg_mape_damped <- mean(mape_damped, na.rm = TRUE)
avg_smape_damped <- mean(smape_damped, na.rm = TRUE)

# Store evaluation metrics for each model in a data frame
ets_batch_evaluation_metrics <- data.frame(
  Model = c("ETS", "Theta", "Damped Exponential Smoothing"),
  MAPE = c(avg_mape_ets, avg_mape_theta, avg_mape_damped),
  sMAPE = c(avg_smape_ets, avg_smape_theta, avg_smape_damped)
)

# Print the evaluation metrics for comparison

kableExtra::kbl(ets_batch_evaluation_metrics, digits = 2, caption = "Error measures evaluating automatic ETS model's out-of-sample accuracy")|>
  kable_styling(
    position = "center",
    full_width = FALSE
  )
Table 10: Error measures evaluating automatic ETS model’s out-of-sample accuracy
Model MAPE sMAPE
ETS 5.14 4.94
Theta 5.54 5.29
Damped Exponential Smoothing 4.67 4.54
# Select the model with the lowest values for MAPE
ets_batch_best_model_mape <- ets_batch_evaluation_metrics[which.min(ets_batch_evaluation_metrics$MAPE), ]

# Select the model with the lowest values for sMAPE
ets_batch_best_model_smape <- ets_batch_evaluation_metrics[which.min(ets_batch_evaluation_metrics$sMAPE), ]

# Print the best model
{cat("Best model based on MAPE:", ets_batch_best_model_mape$Model, "\n"); cat("Best model based on sMAPE:", ets_batch_best_model_smape$Model, "\n")}
Best model based on MAPE: Damped Exponential Smoothing 
Best model based on sMAPE: Damped Exponential Smoothing 

The ETS model exhibits a MAPE of 5.14%, indicating that, on average, its forecasts have a percentage error of approximately 5.14% compared to the future values. Additionally, it has an sMAPE of 4.94%, suggesting that, on average, its forecasts have a symmetric percentage error of approximately 4.94% compared to the future data.

Although the Damped Exponential Smoothing model slightly outperforms the ETS model, the latter still exhibits superior accuracy compared to the Theta model. Therefore, the ETS model remains a reliable choice for forecasting, especially considering its competitive performance. Factors such as computational simplicity or model interpretability may further support selecting the ETS model for forecasting tasks.

Automatic TBATS (Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend, and Seasonal components) model

The TBATS model is renowned for its capability to handle multiple seasonalities, trends, and complex patterns in time series data (Brozyna, Mentel and Szetela, 2016). It is particularly useful for data sets with intricate seasonal patterns and irregular trends. It is preferred for automatic forecasting tasks where data may exhibit multiple seasonalities and nonlinear patterns. However, it may suffer from computational intensity, especially with large data sets, and might require tuning parameters for optimal performance in certain cases.

TBATS decomposes the time series into components, including multiple seasonalities and trends, utilising trigonometric functions and Box-Cox transformations. Subsequently, it fits an ARMA model to the residuals to capture any remaining temporal dependencies, providing forecasts based on the estimated components and model parameters.
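As a brief illustration of this decomposition (using R’s built-in monthly USAccDeaths series rather than an M3 one), the fitted object reports the automatically selected components, and tbats.components() extracts them:

```r
library(forecast)

# Fit TBATS; selection of the Box-Cox lambda, trend damping, ARMA error
# orders, and trigonometric seasonal terms is fully automatic
fit <- tbats(USAccDeaths)

# The printed model name encodes the choices,
# e.g. TBATS(lambda, {p,q}, phi, {<period,k>})
print(fit)

# Extract and plot the decomposed level, trend (if present), and seasonal components
plot(tbats.components(fit))
```

The same tbats() call is applied to each quarterly M3 series in the batch loop below.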

# Define the series IDs and criterion
ts_start <- 1001
ts_end <- 1100
criterion <- "aicc"
num_ts <- ts_end - ts_start + 1

# Initialise arrays to store MAPE and sMAPE for TBATS and benchmarks
mape_tbats <- numeric(num_ts)
mape_theta <- numeric(num_ts)
mape_damped <- numeric(num_ts)
smape_tbats <- numeric(num_ts)
smape_theta <- numeric(num_ts)
smape_damped <- numeric(num_ts)

# Loop through each time series
for (s in ts_start:ts_end) {
  train_data <- M3[[s]]$x
  test_data <- M3[[s]]$xx
  h <- length(test_data)

  # Fit TBATS model
  tbats_fit <- tbats(train_data)
  tbats_fcst <- forecast(tbats_fit, h = h)$mean

  # Calculate MAPE for TBATS
  mape_tbats[s - ts_start + 1] <- 100 * mean(abs(test_data - tbats_fcst) / test_data, na.rm = TRUE)
  # Calculate sMAPE for TBATS
  smape_tbats[s - ts_start + 1] <- 200 * mean(abs(test_data - tbats_fcst) / (abs(test_data) + abs(tbats_fcst)), na.rm = TRUE)

  # Fit Theta model
  theta_fit <- thetaf(train_data, h = h)
  theta_fcst <- theta_fit$mean  # thetaf() already returns a forecast object

  # Calculate MAPE for Theta
  mape_theta[s - ts_start + 1] <- 100 * mean(abs(test_data - theta_fcst) / test_data, na.rm = TRUE)
  # Calculate sMAPE for Theta
  smape_theta[s - ts_start + 1] <- 200 * mean(abs(test_data - theta_fcst) / (abs(test_data) + abs(theta_fcst)), na.rm = TRUE)

  # Fit Damped Exponential Smoothing model
  tryCatch({
    damped_model <- ets(train_data, model = "ZZZ", damped = TRUE)
    damped_fcst <- forecast(damped_model, h = h)$mean
    # Calculate MAPE for Damped Exponential Smoothing
    mape_damped[s - ts_start + 1] <- 100 * mean(abs(test_data - damped_fcst) / test_data, na.rm = TRUE)
    # Calculate sMAPE for Damped Exponential Smoothing
    smape_damped[s - ts_start + 1] <- 200 * mean(abs(test_data - damped_fcst) / (abs(test_data) + abs(damped_fcst)), na.rm = TRUE)
  }, error = function(e) {
    mape_damped[s - ts_start + 1] <<- NA  # <<- assigns to the vector in the enclosing scope
    smape_damped[s - ts_start + 1] <<- NA
  })
}

Similar to the auto ARIMA and ETS models, we compute the average Mean Absolute Percentage Error (MAPE) and Symmetric Mean Absolute Percentage Error (sMAPE) for the TBATS forecasting models and compare them with the metrics for Theta and Damped Exponential Smoothing models as benchmarks.

# Calculate average MAPE and sMAPE for each method
avg_mape_tbats <- mean(mape_tbats, na.rm = TRUE)
avg_smape_tbats <- mean(smape_tbats, na.rm = TRUE)
avg_mape_theta <- mean(mape_theta, na.rm = TRUE)
avg_smape_theta <- mean(smape_theta, na.rm = TRUE)
avg_mape_damped <- mean(mape_damped, na.rm = TRUE)
avg_smape_damped <- mean(smape_damped, na.rm = TRUE)

# Store evaluation metrics for each model in a data frame
tbats_batch_evaluation_metrics <- data.frame(
  Model = c("TBATS", "Theta", "Damped Exponential Smoothing"),
  MAPE = c(avg_mape_tbats, avg_mape_theta, avg_mape_damped),
  sMAPE = c(avg_smape_tbats, avg_smape_theta, avg_smape_damped)
)

# Print the evaluation metrics for comparison

kableExtra::kbl(tbats_batch_evaluation_metrics, digits = 2, caption = "Error measures evaluating automatic TBATS model's out-of-sample accuracy")|>
  kable_styling(
    position = "center",
    full_width = FALSE
  )
Table 11: Error measures evaluating automatic TBATS model’s out-of-sample accuracy
Model MAPE sMAPE
TBATS 6.16 5.83
Theta 5.54 5.29
Damped Exponential Smoothing 4.67 4.54
# Select the model with the lowest values for MAPE
tbats_batch_best_model_mape <- tbats_batch_evaluation_metrics[which.min(tbats_batch_evaluation_metrics$MAPE), ]

# Select the model with the lowest values for sMAPE
tbats_batch_best_model_smape <- tbats_batch_evaluation_metrics[which.min(tbats_batch_evaluation_metrics$sMAPE), ]

# Print the best model
{cat("Best model based on MAPE:", tbats_batch_best_model_mape$Model, "\n"); cat("Best model based on sMAPE:", tbats_batch_best_model_smape$Model, "\n")}
Best model based on MAPE: Damped Exponential Smoothing 
Best model based on sMAPE: Damped Exponential Smoothing 

The TBATS model shows an average MAPE of about 6.16% and an sMAPE of around 5.83%. However, the Damped Exponential Smoothing and Theta models outperform it, with average MAPEs of 4.67% and 5.54% and sMAPEs of 4.54% and 5.29%, respectively.

To summarise the above, we output the average MAPE and sMAPE values for the automatic ARIMA, ETS, and TBATS models alongside their benchmark models (Theta and Damped Exponential Smoothing) to determine the best-performing model. The results are sorted in ascending order based on their combined performance in both MAPE and sMAPE.

# Store evaluation metrics for each method in a data frame
evaluation_metrics_summary <- data.frame(
  Method = c("ARIMA", "ETS", "TBATS", "Theta", "Damped Exponential Smoothing"),
  MAPE = c(avg_mape_arima, avg_mape_ets, avg_mape_tbats, avg_mape_theta, avg_mape_damped),
  sMAPE = c(avg_smape_arima, avg_smape_ets, avg_smape_tbats, avg_smape_theta, avg_smape_damped)
)

# Sort the data frame in ascending order based on both MAPE and sMAPE values
sorted_metrics <- evaluation_metrics_summary[order(evaluation_metrics_summary$MAPE, evaluation_metrics_summary$sMAPE), ]

# Print the sorted data frame

kableExtra::kbl(sorted_metrics, digits = 2, caption = "Error measures evaluating out-of-sample accuracy of the automatic models")|>
  kable_styling(
    position = "center",
    full_width = FALSE
  )
Table 12: Error measures evaluating out-of-sample accuracy of the automatic models
Method MAPE sMAPE
Damped Exponential Smoothing 4.67 4.54
ETS 5.14 4.94
Theta 5.54 5.29
ARIMA 5.68 5.37
TBATS 6.16 5.83
# Identify the row corresponding to the best model
best_model_row <- sorted_metrics[1, 1]

# Print the best model with highlighting
cat("Best model based on MAPE and sMAPE:", best_model_row)
Best model based on MAPE and sMAPE: Damped Exponential Smoothing

The analysis reveals that the Damped Exponential Smoothing method emerges as the best model, boasting the lowest MAPE and sMAPE values of 4.67% and 4.54%, respectively. This indicates that, on average, this method provides the most accurate forecasts compared to the others evaluated.

Conversely, the TBATS method exhibits the highest MAPE (6.16%) and sMAPE (5.83%) values, indicating the least accurate forecasting performance among all the evaluated methods. The TBATS model might perform worse due to factors such as potential overfitting, violation of model assumptions, and limitations in historical data. These issues can collectively hinder the model’s accuracy and effectiveness in forecasting.

Conclusions

This report has provided valuable insights into the performance and effectiveness of various forecasting methods for quarterly time series data.

Although the 1975 recession significantly impacted forecast accuracy, and the manual models struggled to anticipate the abrupt shifts in unemployment rates during this economic turmoil, the forecasts before the recession yielded promising results, with the regression model exhibiting superior accuracy.

The competitive edge of the Damped Exponential Smoothing model over the automatic ARIMA, ETS, and TBATS models in batch forecasting is consistent with the finding of Koning, Franses, Hibon and Stekler (2005) that more complex forecasting methods do not necessarily produce more accurate forecasts.
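As an illustration of how such a damped-trend model can be fitted with the forecast package (attached in the session info below), here is a minimal sketch using a simulated quarterly series rather than the report's M3 batch pipeline:

```r
library(forecast)

set.seed(123)

# Simulated quarterly series (not the M3 data); frequency = 4 for quarterly
y <- ts(50 + cumsum(rnorm(40, mean = 0.5)), frequency = 4)

# Holt's linear method with a damped trend: the trend flattens out over
# the horizon instead of growing without bound, which often guards
# against over-optimistic long-range forecasts
fit <- holt(y, damped = TRUE, h = 8)

summary(fit)
```

The damping parameter phi is estimated from the data by default; values below 1 progressively flatten the projected trend across the 8-quarter horizon.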

It is crucial to acknowledge the inherent limitations in forecasting, including the reliance on historical data and assumptions regarding stationarity and underlying patterns. Moving forward, continued research and experimentation are essential to refining forecasting methodologies and addressing the evolving challenges posed by dynamic and uncertain environments.

By leveraging the insights from this study, organisations can enhance their forecasting capabilities and make informed decisions to navigate the complexities of today’s business landscape. Used strategically, forecasting techniques help businesses optimise resource allocation, minimise risks, and seize opportunities for growth and innovation.


References

Bank of Canada (1999) Canadian economic performance at the end of the twentieth century. Available at: https://www.bankofcanada.ca/1999/06/canadian-economic-performance-end-twentieth-century/ (Accessed: 29 March 2024).

Brozyna, J., Mentel, G. and Szetela, B. (2016) ‘Influence of double seasonality on economic forecasts on the example of energy demand’, Journal of International Studies, 9(3), pp. 9–20. Available at: https://doi.org/10.14254/2071-8330.2016/9-3/1.

Koning, A.J., Franses, P.H., Hibon, M. and Stekler, H.O. (2005) ‘The M3 competition: Statistical tests of the results’, International Journal of Forecasting, 21(3), pp.397–409. Available at: https://doi.org/10.1016/j.ijforecast.2004.10.003.

Makridakis, S. and Hibon, M. (2000) ‘The M3-Competition: results, conclusions and implications’, International Journal of Forecasting, 16(4), pp.451–476. Available at: https://doi.org/10.1016/s0169-2070(00)00057-1.

Appendix 1. Session Info

R version 4.5.2 (2025-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Tahoe 26.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Madrid
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] urca_1.3-4         tseries_0.10-58    scales_1.4.0       broom_1.0.8       
 [5] knitr_1.50         kableExtra_1.4.0   ggthemes_5.1.0     RColorBrewer_1.1-3
 [9] plotly_4.10.4      fable_0.4.1        feasts_0.4.2       fabletools_0.5.1  
[13] tsibbledata_0.4.1  tsibble_1.1.6      lubridate_1.9.4    tidyr_1.3.1       
[17] dplyr_1.1.4        tibble_3.2.1       fpp3_1.0.2         ggplot2_3.5.2     
[21] Mcomp_2.8          forecast_8.24.0   

loaded via a namespace (and not attached):
 [1] gtable_0.3.6         anytime_0.3.12       xfun_0.52           
 [4] bslib_0.9.0          htmlwidgets_1.6.4    lattice_0.22-7      
 [7] quadprog_1.5-8       vctrs_0.6.5          tools_4.5.2         
[10] generics_0.1.4       curl_6.2.3           parallel_4.5.2      
[13] xts_0.14.1           pkgconfig_2.0.3      data.table_1.17.4   
[16] distributional_0.5.0 lifecycle_1.0.4      stringr_1.5.1       
[19] compiler_4.5.2       farver_2.1.2         textshaping_1.0.1   
[22] htmltools_0.5.8.1    sass_0.4.10          lazyeval_0.2.2      
[25] yaml_2.3.10          pillar_1.10.2        crayon_1.5.3        
[28] jquerylib_0.1.4      ellipsis_0.3.2       cachem_1.1.0        
[31] nlme_3.1-168         fracdiff_1.5-3       tidyselect_1.2.1    
[34] digest_0.6.37        stringi_1.8.7        purrr_1.0.4         
[37] bookdown_0.43        labeling_0.4.3       fastmap_1.2.0       
[40] grid_4.5.2           colorspace_2.1-1     cli_3.6.5           
[43] magrittr_2.0.3       withr_3.0.2          backports_1.5.0     
[46] rappdirs_0.3.3       timechange_0.3.0     httr_1.4.7          
[49] TTR_0.24.4           rmarkdown_2.29       quantmod_0.4.28     
[52] nnet_7.3-20          timeDate_4041.110    zoo_1.8-14          
[55] evaluate_1.0.3       lmtest_0.9-40        viridisLite_0.4.2   
[58] rlang_1.1.6          Rcpp_1.0.14          glue_1.8.0          
[61] xml2_1.3.8           svglite_2.2.1        rstudioapi_0.17.1   
[64] jsonlite_2.0.0       R6_2.6.1             systemfonts_1.2.3